ABSTRACT

EcoCyc [...] information about the genome and the [...] of E. coli. The database describes 3030 genes
of E. coli, 695 enzymes, 595 metabolic reactions that occur in E. coli, and the organization of these
reactions into 123 metabolic pathways. The EcoCyc graphical user interface allows scientists to query
and explore the EcoCyc database using visualization tools such as [...] and automatic layouts of
metabolic pathways. EcoCyc can be thought of as an electronic review article because of its copious
references to the primary literature [...]

[...] known genes of E. coli, [...] the reactions catalyzed by each enzyme, and the organization of
these reactions into metabolic pathways. EcoCyc can be viewed as an electronic review article because
it is a carefully sifted collection of information drawn largely from (and containing [...] citations
to) the primary literature. The EcoCyc graphical user interface (GUI) allows scientists to query,
explore, and visualize the EcoCyc DB. EcoCyc integrates genomic and functional data to allow scientists
to investigate a broad range of questions (4).

Among the problems that might be addressed using EcoCyc are the following (some of these tasks are
not directly supported by the EcoCyc user interface and would require additional programming).

• EcoCyc is a resource for analysis of microbial genomes. For example, EcoCyc has been used to
predict the metabolic pathways of H. influenzae ([...]) and of H. pylori.

• Because of its links to sequence DBs such as Swiss-Prot, EcoCyc can be used to perform
function-based retrieval of DNA or protein sequences, for example to prepare datasets for studies
of protein structure-function relationships.

• Scientists who study the evolution of metabolism can use EcoCyc to search out examples of
duplication and divergence of enzymes and pathways.

• EcoCyc provides a foundation for performing simulations of the metabolism, although it currently
lacks the kinetics data used by most simulation techniques.

• The DB has been used as an aid in teaching biochemistry.

This article describes recent enhancements to EcoCyc and how to access EcoCyc. We request that users
of EcoCyc cite this article in publications related to its use.

RECENT ENHANCEMENTS 

In the past year we supplemented the EcoCyc data with the following 13 new pathways: glutamine
utilization; L-serine degradation; glutamate utilization; L-cysteine catabolism; tryptophan utilization;
2-phenylethylamine degradation; enterobactin synthesis; aerobic electron transfer; anaerobic electron
transfer; carnitine metabolism, CoA-linked; carnitine metabolism; pyridine nucleotide cycling;
nucleotide metabolism.



We have reorganized the Overview diagram of the E. coli metabolic map. The Overview is now available
through the WWW at http://ecocyc.PangeaSystems.com/[...].html, and is shown in Figure 1. The
new organization reflects the new pathways added to EcoCyc in the past year, and also reflects a new
organizing principle: anabolic pathways are drawn on the left side of the diagram, catabolic pathways are
drawn on the right side, and energy-producing pathways are drawn in the middle. (Because some
metabolic reactions perform more than one role under different metabolic circumstances, we made a
choice as to the primary role.) EcoCyc provides several queries that operate on the Overview. Users can
highlight objects in the Overview, such as finding compounds by name or substring, finding reactions by
EC number, or finding enzymes by name or substring. Users can also highlight enzymes according to
their sensitivity to metabolites, such as all enzymes that are inhibited by ATP or that are activated by
L-lactate. EcoCyc maintains a history of the highlighting operations, so users can undo or redo their
highlighting queries. Highlighting operations are not supported in the WWW version of EcoCyc.

Figure 1. Guide to the EcoCyc schema. This version of the Overview diagram
shows a comparison between the full metabolic network of E. coli and the
predicted metabolic network of H. influenzae. Each circle represents a single
metabolite. Each blue line represents a metabolic reaction that occurs in E. coli
only; each green line represents a reaction that occurs in both E. coli and
H. influenzae. Each grey line connects two dots that represent the same metabolite.
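
The undo and redo behaviour of the Overview highlighting queries can be pictured with two stacks. The following is a minimal sketch only: the class name, method names, and query strings are hypothetical illustrations and are not taken from the EcoCyc software.

```python
from dataclasses import dataclass, field


@dataclass
class HighlightHistory:
    """Undo/redo history for highlighting queries (hypothetical structure)."""
    done: list = field(default_factory=list)    # queries currently applied
    undone: list = field(default_factory=list)  # queries available for redo

    def apply(self, query: str) -> None:
        self.done.append(query)
        self.undone.clear()  # a new query discards the redo chain

    def undo(self) -> None:
        if self.done:
            self.undone.append(self.done.pop())

    def redo(self) -> None:
        if self.undone:
            self.done.append(self.undone.pop())


# Illustrative query descriptions, written as plain strings.
history = HighlightHistory()
history.apply("enzymes inhibited by ATP")
history.apply("enzymes activated by L-lactate")
history.undo()  # only the ATP query remains applied
```

Clearing the redo stack on each new query is one common design choice; whether EcoCyc behaves this way is not stated in the text.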

A new visualization within EcoCyc pathway displays shows the distribution of genes that encode the
enzymes of a pathway. The visualization consists of a small circle representing the chromosome with
tick marks drawn for each gene in the pathway. When the user moves the mouse pointer over a tick
mark, EcoCyc flashes the name of the gene, and highlights the arrow within the pathway drawing for the
reaction(s) that are catalyzed by the gene product.
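
One way to place such tick marks is to convert each gene's map position into an angle on the circle. The sketch below assumes that position 0 is drawn at the top and that positions increase clockwise; the function name, the units, and the drawing convention are assumptions made for illustration, not details of the EcoCyc display code.

```python
import math


def tick_position(map_position, chromosome_length, radius=1.0):
    """Return the (x, y) point of a gene's tick mark on a circular chromosome.

    Assumed convention: position 0 is at the top of the circle and
    positions increase clockwise.
    """
    angle = 2.0 * math.pi * (map_position / chromosome_length)
    return (radius * math.sin(angle), radius * math.cos(angle))


# A gene one quarter of the way around the map lands on the right-hand side.
x, y = tick_position(25.0, 100.0)
```

The same function works for any unit of map position, as long as the gene position and the chromosome length use the same unit.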

We have also enhanced pathway visualizations to depict polymerization steps (see Fig. 2). We use a
dashed line to indicate that two compound names are, in certain situations, meant to represent the same
chemical species. For example, most textbooks depict saturated fatty acid elongation as a spiral, where
each turn of the spiral adds two carbons to the backbone. Our representation shows the pathway as a
cycle, using generic rather than specific names for the compounds involved. At the beginning of the
cycle is acyl-ACP, which undergoes several reactions producing acyl(N+2)-ACP. A dashed line is drawn
between these two names to indicate that the N+2 species becomes the N species for the next iteration of
the cycle. We also use the dashed line when showing equivalence between a specific name for a
compound (such as a starting or ending compound for a series of polymerization reactions) and the
generic form. Using this scheme, we can compactly represent polymerization pathways as cycles of
generic compounds, with specific compounds as inputs and/or outputs.
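
The relationship between the textbook spiral and the generic cycle can be sketched in a few lines: each pass around the cycle takes the N species to the N+2 species, and the dashed line corresponds to feeding that product back in as the next N. The function name and the starting chain length below are illustrative assumptions.

```python
def elongate(start_carbons, turns, carbons_per_turn=2):
    """List the acyl chain lengths visited over repeated passes of the cycle."""
    lengths = [start_carbons]
    for _ in range(turns):
        # One pass of the cycle: acyl(N)-ACP becomes acyl(N+2)-ACP.
        # The dashed line in the drawing corresponds to reusing this
        # product as the N species of the next pass.
        lengths.append(lengths[-1] + carbons_per_turn)
    return lengths


# Three turns starting from a hypothetical 4-carbon acyl chain.
chain_lengths = elongate(4, 3)
```

Unrolling the loop gives the spiral view; the loop body alone is the cycle view.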

EcoCyc now contains descriptions of 79 tRNAs. Each tRNA is represented as a distinct object within the
DB, and is linked to the EcoCyc object that represents the gene for the tRNA. In addition, 33 tRNA
synthetases, and the associated charging reactions, are also encoded as EcoCyc objects, where the tRNA
objects are substrates in these reactions. Additional substrates include the charged tRNAs, which are
also represented as distinct objects within the DB.
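
The object relationships described for tRNA charging can be sketched as a small data structure. This is an illustration under stated assumptions: the class names and every identifier are placeholders rather than real EcoCyc object names, and the sketch separates reactants from products, whereas the text above uses "substrates" for both.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DBObject:
    """A distinct object in the database (hypothetical minimal form)."""
    object_id: str


@dataclass(frozen=True)
class ChargingReaction:
    """A charging reaction with its enzyme and participants (hypothetical)."""
    synthetase: DBObject
    substrates: tuple
    products: tuple


# Placeholder identifiers, not real EcoCyc object names.
trna_gene = DBObject("gene-for-tRNA-X")
trna = DBObject("tRNA-X")
charged_trna = DBObject("charged-tRNA-X")
synthetase = DBObject("X-tRNA-synthetase")

# Link from each tRNA object to the object that represents its gene.
gene_of = {trna: trna_gene}

# Charging consumes the tRNA, an amino acid, and ATP, and yields the
# charged tRNA, AMP, and diphosphate.
charging = ChargingReaction(
    synthetase=synthetase,
    substrates=(trna, DBObject("amino-acid-X"), DBObject("ATP")),
    products=(charged_trna, DBObject("AMP"), DBObject("diphosphate")),
)
```

Because the uncharged and charged tRNAs are separate objects, a query can follow a tRNA from its gene through the reaction to its charged form.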

The reactions of two-component signal transduction systems in E. coli have been added to EcoCyc.
About 22 signal transduction systems are in E. coli, involving at least two gene products each. These
types of regulatory reactions have counterparts in eukaryotic organisms and are in this sense
housekeeping functions with ancient common ancestors.

The two components, the sensor protein and the response regulator protein, interact to convert an
environmental signal (either internal or external) into regulation of relevant gene expression. Although
the systems differ, generally the sensor protein becomes phosphorylated when stimulated by a specific
condition such as lack of oxygen or shortage of nitrogen. The phosphorylated sensor transfers the
phosphate to a regulator protein, which then transfers the phosphate to another of its amino acid residues
internally, accompanied by an allosteric change of the regulator protein. The altered regulator is then
active as a transcriptional activator. Although signal transduction systems of E. coli exhibit broad
similarities, there are at least three classes with different modes of action corresponding to different
[...] sequence domains. The reactions of those systems often take the form of a [...] sequential events.
The reactions are, in some sense, in the realm of macromolecule metabolism, [...] are substrates and
products in reactions that modify the covalent composition of the proteins. We represent the functions
of two-component signal transductions, such as phosphorylation events, as reactions.

EcoCyc reactions are now linked to three other metabolic DBs: ENZYME (1), WIT ([...]), and Ligand.
Table 1 summarizes all the DBs to which EcoCyc is linked.

By the time this article appears, we expect that data from the full genomic sequence of E. coli ([...])
will be [...] into EcoCyc. We plan to create EcoCyc objects for all E. coli genes, and to incorporate
the map positions determined by the Blattner group.

Table 1. The biological DBs to which objects in different EcoCyc classes are linked, e.g., EcoCyc genes
are linked to the CGSC DB and to GenBank

Class          Linked databases

Genes          Coli Genetic Stock Center, GenBank

Polypeptides   ExPASy Swiss-Prot, NCBI Swiss-Prot, PDB, Swiss-Model

Citations      PubMed

Reactions      ENZYME, WIT, Ligand



THE EcoCyc GRAPHICAL USER INTERFACE

The EcoCyc GUI ([...]) provides graphical tools for visualizing and navigating through an integrated
collection of metabolic and genomic information (its retrieval capabilities are described in ref. 8).
For each type of object in the EcoCyc DB, the GUI provides a corresponding visualization tool.
These tools dynamically query the underlying DB. Most display algorithms are parameterized to allow
the user to select the visual presentation of an object that is most informative. For example, the
tools that produce automatic layouts of metabolic pathways can suppress the display of [...]
names or side-compound names; they can also draw chemical structures for the compounds within a
pathway. More details on the display algorithms can be found in ref. 2.

THE EcoCyc DATA

The EcoCyc data are stored within a frame knowledge representation system (FRS) called Ocelot. FRSs
use an object-oriented data model. FRSs organize information within classes: collections of objects that
share similar properties and attributes. Table 2 shows the current size of several EcoCyc classes;
statistics pertain to EcoCyc version [...].
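
The idea of classes as collections of objects that share properties can be illustrated with a toy frame store. This is a generic sketch of frame-style representation, not the Ocelot API: the Frame class, its methods, the slot names, and the object names are all hypothetical. It assumes that an object inherits slot values from its class, which is a common feature of frame systems.

```python
class Frame:
    """A toy frame: a named object with slots and an optional parent class."""

    def __init__(self, name, parent=None, **slots):
        self.name = name
        self.parent = parent
        self.slots = dict(slots)

    def get(self, slot):
        """Look up a slot locally, then in ancestor classes."""
        frame = self
        while frame is not None:
            if slot in frame.slots:
                return frame.slots[slot]
            frame = frame.parent
        raise KeyError(slot)

    def is_a(self, ancestor):
        """Return True if `ancestor` is on this frame's parent chain."""
        frame = self.parent
        while frame is not None:
            if frame is ancestor:
                return True
            frame = frame.parent
        return False


# Hypothetical classes and instances; slot names are illustrative only.
genes = Frame("Genes", organism="E. coli")
pathways = Frame("Pathways", organism="E. coli")
instances = [
    Frame("gene-1", parent=genes, map_position=12.5),
    Frame("gene-2", parent=genes, map_position=47.0),
    Frame("pathway-1", parent=pathways),
]

# Counting the instances of a class gives statistics of the kind in Table 2.
gene_count = sum(1 for frame in instances if frame.is_a(genes))
```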

The current scope of metabolic information within EcoCyc is intermediary metabolism only: EcoCyc
does not describe macromolecular metabolism such as DNA replication or repair, nor transcription, nor
translation. It does describe tRNA charging.

For more information on the contents of EcoCyc and the data validation procedures, see [...]
8; the EcoCyc schema is defined in ref. 7. The retrieval operations supported by the DB are described
in refs 3 and 8. The EcoCyc software architecture is described in ref. 6.



Table 2. The number of objects in several EcoCyc classes

Reactions      595
Enzymes        695
Pathways       123
Genes          3030
tRNAs          79
Compounds      12[...]